
Python multiple inheritance #693

Merged
merged 1 commit into from
Jun 12, 2017
Conversation

jagerman
Member

This builds on @dean0x7d's #679 to support multiple inheritance of C++ classes from Python. This required no dispatch changes at all, but it does require some changes to be able to store multiple C++ base objects in a single instance.

This is done by changing the instance layout to contain a single void** which is allocated to the correct size and contains value/holder pointer pairs. The value pointers are themselves each set to newly-allocated memory of the size required for each value, and the holders are initialized to nullptr.

Thus the instance changes from this:

  • ... python stuff ...
  • pointer to VALUE data
  • weakrefs *
  • bools
  • HOLDER

to:

  • ... python stuff ...
  • pointer to value/holder data: [[VALUE1 *][HOLDER1 *][VALUE2 *][HOLDER2 *]...]
  • weakrefs *
  • bools

where each VALUEn pointer is to newly allocated memory of the size required by the value type, and each HOLDERn starts out set to nullptr.

This has three effects: First, all instances have the same size (more on this below). Second, an instance can store multiple base values and holders for those base values. Third, we don't need the instance_essentials/instance<HolderType> distinction anymore: there is now just one instance type which can contain any number of values and holders of whatever size.

Structure aside, a new get_all_type_info() function is added which takes a type and returns a vector of all the pybind types that make up the input type. For types that are already a pybind type, or that inherit from just one pybind-registered type, this is just the typeinfo for that type. For multiple inheritance, this will be a list.

A companion function, get_value_and_holder(), uses get_all_type_info() to extract the correct references to the value and holder pointers for a given type from the instance.

As for method dispatching: we don't have to do anything. Python takes care of the MRO. (Cross-class overloads won't work, but that's true in Python in general). The generic type caster now knows how to extract the correct value from a multi-valued instance, and an invoked constructor can get the correct holder type/pointer by simply using the parent of the constructor itself (rather than the instance itself, which may have multiple holders).

.so size increases slightly (~1.2%, or about 20KB, on the full test suite compared to #679 under g++6). I think it's worthwhile (and I expect a good chunk of that increase is constant).

Currently this fails under PyPy: I haven't investigated in detail, but will do so.

Some other comments:

  • since we don't store the holder_type directly in the instance anymore (instead it's in the pointer of pointers), we don't need #679's one-base-per-instance-size approach (because there is now just one instance size): I've replaced it with just one base for all pybind11 types, instead of one per instance size. It's also now constructed in the get_internals() initialization rather than on demand.

  • the holder_constructed flag isn't needed anymore: a nullptr holder pointer now indicates a non-constructed holder. instance_size is similarly dropped.

  • the recursive loading in the generic type caster (and mirrored in the copyable holder caster) isn't needed anymore (the get_all_type_info/get_value_and_holder logic handles python base class traversal now).

@wjakob
Member

wjakob commented Feb 23, 2017

WOW!! -- I'm excited to see that it's possible to pull this off with a relatively small patch (which I'll need to digest a bit more, but that's the high-order bit). I'm wondering if we should release version 3.x pretty soon given the rate of cool stuff that you and @dean0x7d are contributing. :)

@dean0x7d
Member

dean0x7d commented Feb 23, 2017

Cool :) I haven't looked into the code yet, but the base type simplification is very nice.

About PyPy: Not sure if this is helpful or not, but from my investigation, PyPy is sensitive to the order of base definition, i.e. class MI(B1, B2) vs. class MI(B2, B1). It's either a bug which would need to be fixed in PyPy or perhaps something that could be worked around in the metaclass' tp_new.

@jagerman
Member Author

Rebased to master.

@wjakob
Member

wjakob commented Feb 23, 2017

FYI: PyPy has quite nice nightly builds that you can download for all sorts of platforms (so no need to debug indirectly via the CI bots)

@jagerman
Member Author

This looks like a PyPy bug: tp_bases seems to only ever be populated with the first inherited type; I pushed a change (to be reverted before merging) to dump the tp_bases content, under which this script:

basetest.py:

from __future__ import print_function
from pybind11_tests import Base1, Base2, debug_tp_bases
class A(object):
    pass
class A2(object):
    pass
class A3(A, A2):
    pass
class B(Base1, Base2):
    def __init__(self):
        Base1.__init__(self, 1)
        Base2.__init__(self, 2)
class C(A, B):
    def __init__(self):
        B.__init__(self)
print("A.__bases__:", A.__bases__)
debug_tp_bases(A)
print("A3.__bases__:", A3.__bases__)
debug_tp_bases(A3)
print("B.__bases__:", B.__bases__)
debug_tp_bases(B)
print("C.__bases__:", C.__bases__)
debug_tp_bases(C)

Python 2 and 3 both print the following (shown from Python 3; the Python 2 output is almost identical, differing only in that it lists type 'object' instead of class 'object' on the first line):

A.__bases__: (<class 'object'>,)
Type `A' tp_bases:
    - object
A3.__bases__: (<class '__main__.A'>, <class '__main__.A2'>)
Type `A3' tp_bases:
    - A
    - A2
B.__bases__: (<class 'pybind11_tests.Base1'>, <class 'pybind11_tests.Base2'>)
Type `B' tp_bases:
    - pybind11_tests.Base1
    - pybind11_tests.Base2
C.__bases__: (<class '__main__.A'>, <class '__main__.B'>)
Type `C' tp_bases:
    - A
    - B

but PyPy prints:

A.__bases__: (<type 'object'>,)
Type `A' tp_bases:
    - object
A3.__bases__: (<class '__main__.A'>, <class '__main__.A2'>)
Type `A3' tp_bases:
    - A
B.__bases__: (<class 'pybind11_tests.Base1'>, <class 'pybind11_tests.Base2'>)
Type `B' tp_bases:
    - Base1
C.__bases__: (<class '__main__.A'>, <class '__main__.B'>)
Type `C' tp_bases:
    - A

Note how tp_bases contains only the first parent in every case.

I'll report upstream.

@wjakob
Member

wjakob commented Feb 23, 2017

Ok -- I don't think this needs to be a blocker, we can just skip these tests on PyPy (as is already done for a few others)

@jagerman
Member Author

It wasn't indirect (I'm using the current pypy nightly locally to test)—I just wanted to easily submit concrete test code to the upstream bug report.

@wjakob
Member

wjakob commented Feb 23, 2017

ah, ok -- makes sense

@jagerman
Member Author

We could probably work around it by getting __bases__ instead of using tp_bases, but getting an upstream fix for tp_bases seems like the slightly cleaner option.

@jagerman
Member Author

Also submitted upstream: https://bitbucket.org/pypy/pypy/issues/2482/cpyext-tp_basicsize-only-considers-first - type->tp_basicsize is too small for a python type using MI when the first inherited type is a pure python type. (PyPy is only inheriting the tp_basicsize from the first base type).

I pushed a commit with workarounds for both issues (which can be removed once the upstream issues are addressed).

Member

@dean0x7d left a comment

One tradeoff with this design is that it adds instance creation/destruction overhead to all types, even when no multiple inheritance is involved. There's an extra 2 * sizeof(void*) of memory per instance (which isn't that much), but the number of allocations per instance goes from 2 to 4*, which might be a concern -- allocations are slow and it spreads the same amount of data over more addresses, hurting memory locality. Of course, this could actually be insignificant compared to the Python-side overhead. Perhaps it would be worth benchmarking?

An alternative might be to only use the new allocation scheme for MI types, but keep the original for simple types.

(*) The original two allocations are the PyObject instance itself and the C++ object. The two new ones are for void **values_and_holders and holder_type.

void *&value, *&holder;
value_and_holder(void *&v, void *&h) : value{v}, holder{h} {}
template <typename V> enable_if_t<std::is_pointer<V>::value, V> value_as() const { return reinterpret_cast<V>(value); }
template <typename H> enable_if_t<std::is_pointer<H>::value, H> holder_as() const { return reinterpret_cast<H>(holder); }
Member

SFINAE might be overkill here since it isn't helping with overload resolution. Consider a simple static_assert instead.

Member Author

Good point; a static_assert more accurately describes what I was going for here.

Member Author

(Replaced it entirely: it is now renamed value_ptr<V> and returns a V* instead; the holder version similarly returns a reference instead of a pointer.)

* (or multiple objects, for Python-side inheritance from multiple pybind11 types), but doesn't call
* the constructor -- an `__init__` function must do that.
*/
extern "C" PYBIND11_NOINLINE inline PyObject *pybind11_object_new(PyTypeObject *type, PyObject *, PyObject *) {
Member

The PYBIND11_NOINLINE attribute isn't really needed since this function is never used directly (only its address is taken) so it will never even be considered for inlining. Same applies to all PyTypeObject functions.

Member Author

Right; that's a leftover of an earlier implementation detail (that did call it) that I ended up reverting; I'll remove it.

Member Author

(Fixed).

auto name = "pybind11_object_" + std::to_string(instance_size);
auto name_obj = reinterpret_steal<object>(PYBIND11_FROM_STRING(name.c_str()));
inline PyObject *make_object_base_type(PyTypeObject *metaclass) {
auto name_obj = reinterpret_steal<object>(PYBIND11_FROM_STRING("pybind11_object"));
Member

Might be nice to extract this and the tp_name below into a single constexpr auto *name = "pybind11_object";.

Member Author

Sounds good.

Member Author

(Fixed).

do {
for (size_t i = 0; i < check.size(); i++) {
type = check[i];
if (!PyType_Check((PyObject *) type)) continue;
Member

Is this only for old-style Python 2 classes? If it is, a comment might be nice. (Or even #if PY_VERSION_MAJOR < 3?)

Member Author

I believe that yes, it only applies to Python 2, old-style classes. (This same logic was essentially there before, though rather indirectly).

@wjakob
Member

wjakob commented Feb 26, 2017

I'd like to do a pass (hopefully later this coming week) before this issue is merged.

@jagerman changed the title from "Python multiple inheritance" to "WIP - Python multiple inheritance" Feb 26, 2017
@jagerman
Member Author

@wjakob - yes, definitely, I think there's more work to do here, and suspect there will be some bigger changes needed here. So feel free to hold off on reviewing it for a while until I have a chance to work on this some more.

The mallocs I definitely have to think about some more. It is slower (a rough benchmark: building a list of 1 million instances of very simple pybind types goes from about 0.8 seconds to 1.2 seconds).

There are actually more than just the extra mallocs in the instance: the two vectors in get_all_type_info() are also problematic--and, I think, are actually a bigger performance issue.

I won't have time to look at this for the next few days, but I have some ideas to explore and will keep playing around to see how I can reduce the performance impact.

I'm a bit hesitant to restrict the support to only non-simple types if it can be avoided without too great a cost; the simple/MI split is currently a nearly-automatic background detail (except in the case where you have multiple inheritance not exposed to Python and need py::multiple_inheritance). Making it required for Python-side MI would bring it very much front and centre: in order to inherit multiply from Python, you would also have to bind with py::multiple_inheritance(), but now instead of being an annotation needed to let pybind know the C++ layout, it would be about enabling a feature on the Python side. I don't like that; but, of course, I don't like a 50% performance hit, either. I'll take some time first to investigate this to see if I can bring down the performance hit.

@aldanor
Member

aldanor commented Feb 26, 2017

@jagerman Re: vectors, by looking briefly at get_all_type_info(), it looks like the resulting vector is only used in a few ways downstream:

  • iterate over it
  • check that it's empty
  • check that its size != 1

In this case, why can't it return a range / iterable without having to alloc the vector?

@wjakob
Member

wjakob commented Feb 26, 2017

Ok, that sounds good. A related issue: I'm wondering if it makes sense to set up some kind of performance/binary size monitoring longer term. It would be quite informative to have a status message that says: this PR makes an example project (not necessarily the full test suite) 10% slower and 5% larger :). It might be tricky to do the performance measurement on Travis, but I would be happy to volunteer cycles on my lab's Jenkins CI bot. (https://rglpc1.epfl.ch/jenkins/)

@jagerman
Member Author

@aldanor - yeah, that's probably a better approach.

@jagerman self-assigned this Feb 26, 2017
@aldanor
Member

aldanor commented Feb 26, 2017

// Absolutely +1 re: benchmarks; it's a bit of a pain to try and construct manual benchmarks from scratch when hacking on internals. Ideally, a whole suite of benchmarks with nice reporting; there are a few nice solutions out there if it's Python-based, e.g. pytest-benchmark.

@dean0x7d
Member

airspeed velocity (used by scipy) may be an interesting solution for continuous performance tracking.

@jagerman
Member Author

jagerman commented Mar 2, 2017

I've just pushed a change that gets rid of the performance issues.

Quick summary of the main changes:

  • base type info lookup is now done via an iterator that only allocates vectors when traversing the base types hits multiple inheritance. Thus the common case (a type which is already a pybind11 type, or inheriting from one via a single-inheritance tree) avoids unwanted allocations.

  • I amended the instance storage structure to be a union of two possible layouts:

    • a "simple" layout is a fixed-size allocation (typically the size of three pointers) that contains the value pointer followed by a single holder. It has enough space to hold either a std::unique_ptr or std::shared_ptr, thus reducing the number of mallocs in the common case back to what it was before. It wastes one pointer's worth of space for std::unique_ptr (which only needs one pointer-sized block), but it seemed worthwhile so that other holders like shared_ptr (which needs two) can also take advantage of the simple layout.
    • When using python-side multiple inheritance, or single inheritance with a custom holder type which is larger than a shared_ptr, the new version allocates a [[bb...][v1*][H1][v2*][H2]...] block, where [H1] is now the holder instance (not a pointer as it was earlier), and the [bb...] are the holder_constructed bits for the multiple holders. The multiple inheritance case thus also reduces the number of allocations from the initial implementation by keeping the instances in the allocated space. This main block (but not the [v1*] pointers) is now also allocated/freed via python, which may improve performance in some situations (most notably, I would expect, in tight create/destroy loops).
  • I replaced/renamed dealloc to destroy_holder, and had it now only do a holder destruction (instead of conditionally doing holder destruction or pointer deallocation). The pointer delete is instead now done in class_support.h, closer to the associated new.

I did not implement a performance test suite :) but I did some profiling of the resulting code, and found that the overhead is significantly reduced compared to the initial version--it's now only single-digit-percentage-points slower on creating millions of tiny instances, rather than ~50% overhead.

@jagerman
Member Author

jagerman commented Mar 2, 2017

@wjakob - I think this is good to be reviewed now, when you have a chance.

@jagerman changed the title from "WIP - Python multiple inheritance" to "Python multiple inheritance" Mar 2, 2017
* of that class. If not using Python-side multiple inheritance, this iterator iterates at most
* once.
*/
class type_info_iterator {
Member

I think this bears only a passing resemblance to an iterator. It's quite heavyweight:

struct type_info_iterator {
    type_info *typeinfo;
    std::shared_ptr<std::vector<type_info *>> all;
    size_t size;
    size_t position;
}

Perhaps replace it with:

struct all_type_info {
    type_info *typeinfo;
    std::vector<type_info *> all;
}

A default-constructed std::vector with capacity() == 0 has similar complexity to an empty std::shared_ptr + 2 * size_t. The advantage is that most of the custom iterator logic can be removed:

auto ti = all_type_info(...);
if (ti.all.empty()) {
    // use `ti.typeinfo` as the only type
} else {
    for (auto item : ti.all) {
        // iterate over all types
    }
}

Member Author

The latest push addresses this basically as you suggested, with a much simpler iterator implementation (i.e. the complex logic is moved to the container class rather than implemented in the iterator).

size_t position = 0;
// Loads bases from a multiple inheritance type; takes the first tuple of multiple bases
// encountered.
void load_mi_type_info(const tuple &init_bases) {
Member

Is the order of bases important? If not, a recursive solution might be a bit more compact and easier to follow:

void add_cpp_bases(std::vector<type_info *> &all, handle bases) {
    for (auto base : reinterpret_borrow<tuple>(bases)) {
        if (!PyType_Check(base.ptr()))
            continue;

        auto type = (PyTypeObject *) base.ptr();
        auto it = type_dict.find(type);
        if (it != type_dict.end())
            all.push_back((detail::type_info *) it->second);
        else if (type->tp_bases)
            add_cpp_bases(all, type->tp_bases);
    }
}
std::vector<type_info *> all;
add_cpp_bases(all, type->tp_bases);

std::sort(all.begin(), all.end());
all.erase(std::unique(all.begin(), all.end()), all.end());

Member Author

jagerman commented Mar 6, 2017

The main reason it's written the way it is was to avoid adding any STL headers here.

The order of bases needs to be consistent (since it determines where things are stored in the values_and_holders data), but beyond that the specific ordering is not important (i.e. ordering by pointer is fine).

std::sort and std::unique both add a dependency on <algorithm>, which core pybind doesn't have. I think that, actually, a plain std::set would accomplish this even better (i.e. changing all to be an std::set instead of a vector), but that would also add an STL dependency (on <set>). I'm fine with adding it (in fact, I'd prefer it, though it's not a huge concern), but I know @wjakob has had some reluctance about adding STL headers before, hence my avoiding them in the PR.

* bound C++ classes as parents. Under this layout, `values_and_holders` is set to a pointer to
 * allocated space of the required size to hold a holder-constructed bitset followed by a
* sequence of value pointers and holders, i.e. [bbb...][val1*][holder1][val2*][holder2]...
* where each [block] is rounded up to a multiple of `sizeof(void *)`.
Member

The logic surrounding the non-simple layout is explained well, but it's still a bit difficult to follow everywhere in the code. Would it be possible to encapsulate all of it in a class? E.g. a complex_layout class which maintains all the invariants of that layout, proper value/holder lookup and destruction. Then replace the above void **values_and_holders with void *complex_layout. That might be an extra allocation, but only for complex types and I'd favor code clarity over performance for those (in contrast to the omnipresent simple-layout types where performance is a concern).

I'm just spitballing here, but another option might be to have some kind of metaholder type which would have the required logic for MI types created in Python. This would revert to the instance_essentials / instance split, where an instance<metaholder> would be created as needed. In this case, metaholder would encapsulate most of the logic required for MI types.

Member Author

That encapsulation is exactly what the value_and_holder struct is for. As such, only three places right now deal with the layout: class_support.h's new/dealloc; the allocation code in cast.h's many-argument cast() (which doesn't use new to avoid the unwanted instance pointer allocation, but can also be simpler because it knows there is only one base class); and the get_value_and_holder() logic itself. No other code should ever need to touch it.

Member Author

I redesigned this a bit in the code just pushed to abstract away the storage details into methods added to instance (implemented in cast.h). This cleans it up a lot: calling code now simply deals with value_and_holders, which abstract the storage details.

@wjakob
Member

wjakob commented Mar 10, 2017

Somewhat related: I've added a benchmark project and a benchmark runner based on airspeed velocity (see https://github.com/pybind/pybind11_benchmark and https://github.com/pybind/pybind11_benchmark_runner/tree/master/benchmarks). Right now there is just a simple test based on the Collatz conjecture. I'm running this for all commits so far and will post a link to the results once it's done. Other benchmarks are welcome.

@jagerman
Member Author

I've rebased this to current master. I'd like to merge this PR relatively soon, so that it can get some testing on master well before a 2.2 release. If it causes problems we can of course revert it, but I think it's in pretty reasonable shape.

Member

@wjakob left a comment

Overall this looks great, just a few minor comments. My only potential concern is the performance cost of the extra bookkeeping -- I assume that this is a non-issue according to your previous message, correct?

* that doesn't use python-side multiple inheritance, this will be a single class; if the class has
* multiple inheritance it could potentially be more.
*/
struct all_type_info : all_type_info_data {
Member

I'm curious: what is the reason for the separation between all_type_info and all_type_info_data?

Member Author

common.h needs to know the size and alignment of all_type_info_data to allocate space for it, but the various methods need things not defined in common.h. Separating the members from the member functions seemed nicer than needing to forward-declare a bunch of things.

}

// Allocate space for flags, values, and holders:
values_and_holders = (void **) PyMem_New(void *, space);
Member

Is there a reason to prefer PyMem_New over more standard C++ memory allocation routines?

Member Author

Python's PyMem_New wraps malloc itself, but also keeps its own pool of free memory, so it has the potential to be closer in memory to the PyObject itself. It may perform better in a tight allocate-deallocate loop (with Python reusing memory rather than needing to dealloc/malloc). It doesn't mix with C++ allocation via new/delete, but in this case, where we just need a small chunk of memory that is controlled entirely within pybind, it seems appropriate.

Member Author

Also: pymalloc's arena allocator is the default for PyMem_New in Python 3.6; in earlier versions I think PyMem_New is basically just a wrapper around malloc.

}
bool holder_constructed() const {
if (inst->simple_layout) return inst->simple_holder_constructed;
return reinterpret_cast<std::uintptr_t>(inst->values_and_holders[index / bits_per_ptr])
Member

I didn't follow why the indexing computations are at bit granularity, rather than at byte granularity.

Member Author

My thinking was to potentially save space, but given that it's unlikely to make any difference until you inherit from more than 8 different pybind classes, that was perhaps premature optimization. I'll simplify it to byte-level.

@jagerman force-pushed the python-mi branch 3 times, most recently from 1770ca8 to c66eddc (May 27, 2017)
@wjakob
Member

wjakob commented Jun 1, 2017

Is there anything left to do here? Otherwise I'd suggest merging it to leave sufficient time for stabilization.

@dean0x7d
Member

dean0x7d commented Jun 1, 2017

Sorry, I haven't been following the latest changes. One thing that strikes me now is the addition of all_type_info_storage to instance. That wasn't there in the previous version of this PR, but, assuming a 64-bit machine, it adds an additional 32 bytes to sizeof(instance). This is in addition to the +8 bytes for shared_ptr storage. This is added to all instances, regardless of MI or holder type. So even the most basic single-inheritance/default-holder type goes up from 64 bytes to 104 bytes per instance (+62% increase) which seems a bit much. If I recall correctly this was down to just +8 bytes in a previous version of the PR (all_type_info_storage wasn't there), or am I misremembering?

@jagerman
Member Author

jagerman commented Jun 1, 2017

Agreed, the instance size increase isn't all that desirable. I've got some code nearly ready that removes it without sacrificing performance; I need to test it a bit more, but should have it ready soon.

@dean0x7d
Member

dean0x7d commented Jun 7, 2017

I think that all_type_info and specifically load_mi_type_info() could be significantly simpler by iterating over the MRO list, similar to class_support.h#L452. Python already guarantees that the types in MRO are unique and it includes the entire chain so there wouldn't be any need to repeatedly look up bases.

@jagerman
Member Author

jagerman commented Jun 7, 2017

It can indeed be made much simpler (and I've done that in the code I'll push shortly). The main reason for not using the MRO list is that I don't want to go beyond any registered bases. E.g. if I have registered C++ classes A and B, B inheriting from A, with Python classes PyA and PyB, I want the list for PyC(PyB) to include PyB, not PyA. For single inheritance, I could just stop at the first registered base in the MRO list, but for multiple inheritance that won't work.

Anyway, I've greatly simplified it to make all_type_info() a function returning a vector& of registered bases, with that vector cached per Python type. It adds an unordered_map lookup, but avoids requiring a new allocation or base class traversal on every cast, which is a pretty big gain.

@jagerman
Member Author

jagerman commented Jun 7, 2017

The one thing holding me up are these failures: https://travis-ci.org/jagerman/pybind11/jobs/240213208

They go away if I add a detail::get_internals_ptr() = nullptr; to finalize_interpreter(), but it isn't clear to me why that is needed--somehow the builtin pointer and the static pointer are out of sync. I'm investigating now.

Edit: fixed by #892.

@jagerman
Member Author

jagerman commented Jun 7, 2017

I've got this whittled down now to eliminate the overhead added to the instance by instead caching the base types for each python type encountered in registered_types_py (and using a weakref to delete the cache entry if the python type itself is destroyed).

This simplifies a lot: the all_type_info struct is gone, replaced with an all_type_info() function that returns a vector lvalue reference of base types.

I also rewrote some of the dispatch code to not bother with an all_type_info lookup in certain cases—such as an exact type match. There's still a slight overhead for calling an inherited function, but the overhead now consists solely of the cache lookup (assuming the type has been seen once before); no base type traversal.

Another nice thing here: I combined the type_caster_generic and copyable_holder_caster logic into a single function. The duplication of nearly identical logic between the two has bothered me for a while; this eliminates it by templating the implementation function so as to allow a virtual-like dispatch for the few parts that differ without actually using virtual inheritance. (It does mean each distinct copyable_holder_caster<T, H> gets its own implementation, but that was the case before, too: this is all about reducing the code duplication).
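The merge can be pictured with a small, purely hypothetical Python analogy (the real code is a C++ function template, with the differing steps resolved at compile time rather than via a parameter):

```python
# Shared load logic; the few steps that differ between the generic value
# caster and the copyable-holder caster are supplied by the caller.
def load_value_impl(src, copyable_holder):
    value, holder = src
    if holder is not None and not copyable_holder:
        raise TypeError("holder requires the copyable-holder caster")
    return value

def type_caster_generic_load(src):
    return load_value_impl(src, copyable_holder=False)

def copyable_holder_caster_load(src):
    return load_value_impl(src, copyable_holder=True)
```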

This relies on #892 (the weakrefs trigger the failure in the embedding tests). I've temporarily included that commit in this PR to get the tests to pass.

@jagerman force-pushed the python-mi branch 3 times, most recently from 4856993 to f570e36 on June 8, 2017 19:43
@dean0x7d left a comment:
Looks great overall. I left some minor comments, no showstoppers.

👍 for removing the code duplication in type_caster_generic and copyable_holder_caster.

return reinterpret_cast<V *&>(vh[0]);
}
// True if this `value_and_holder` has a non-null value pointer
operator bool() const { return value_ptr(); }
Member:

explicit?

instance *inst;
using vec_iter = std::vector<detail::type_info *>::const_iterator;
vec_iter typeit;
size_t end;
Member:

Looks like this member variable isn't being used anywhere other than the constructor.

// values that tracks whether each associated holder has been initialized. Each [block] is
// padded, if necessary, to an integer multiple of sizeof(void *).
size_t space = 0;
size_in_ptrs(n_types);
Member:

This line doesn't appear to do anything.

#else
nonsimple.values_and_holders = (void **) PyMem_New(void *, space);
std::memset(nonsimple.values_and_holders, 0, space * sizeof(void *));
#endif
Member:

PyMem_* can return nullptr. Throw std::bad_alloc?

@@ -559,7 +527,8 @@ inline PyObject* make_new_python_type(const type_record &rec) {
type->tp_name = strdup(full_name.c_str());
type->tp_doc = tp_doc;
type->tp_base = (PyTypeObject *) handle(base).inc_ref().ptr();
type->tp_basicsize = static_cast<ssize_t>(rec.instance_size);
type->tp_basicsize = static_cast<ssize_t>(sizeof(instance));
//type->tp_itemsize = 16;
Member:

Leftover commented out code.


// Forward declarations
struct type_info;
struct all_type_info;
Member:

Not a struct anymore.

void *simple_value_holder[1 + instance_simple_holder_in_ptrs()];
struct {
void **values_and_holders;
unsigned char *holder_constructed;
Member:

Why not bool*? Strict aliasing concerns?

Member Author:

I was actually more concerned that sizeof(bool) could be != 1, while unsigned char is guaranteed to be 1 (and thus I can avoid sizeof(bool) complications, which look a bit silly).

Member:

As far as I can see, the only required change would be:

space += size_in_ptrs(n_types * sizeof(bool));

If the rest is managed via bool * there shouldn't be any need to use sizeof(bool) anywhere else.

Member Author:

Good point; I'll switch it over.

class MI1(Base1, Base2):
def __init__(self, i, j):
Base1.__init__(self, i)
Base2.__init__(self, j)
Member:

What kind of error is raised if someone tries to use a C++ base method but accidentally forgot to call __init__ for that C++ base? Would be nice to cover it with a test.

Member Author:

Currently nothing is raised: the space is allocated, but the constructor isn't invoked, so you get UB. I'm not sure what we should do about it, or how we can deal with it.

This isn't actually specific to multiple inheritance, though: the same thing happens if you do:

class PyDerived(CppBase):
    def __init__(self):
        pass

a = PyDerived()
a.some_cpp_method()

Member:

Ah, that's a separate issue then. Never mind.

This commit allows multiple inheritance of pybind11 classes from
Python, e.g.

    class MyType(Base1, Base2):
        def __init__(self):
            Base1.__init__(self)
            Base2.__init__(self)

where Base1 and Base2 are pybind11-exported classes.

This requires collapsing the various builtin base objects
(pybind11_object_56, ...) introduced in 2.1 into a single
pybind11_object of a fixed size; this fixed-size object allocates enough
space to contain either a simple object (one base class & small* holder
instance), or a pointer to a new allocation that can contain an
arbitrary number of base classes and holders, with holder size
unrestricted.

* "small" here means having a sizeof() of at most 2 pointers, which is
enough to fit unique_ptr (sizeof is 1 ptr) and shared_ptr (sizeof is 2
ptrs).

To minimize the performance impact, this repurposes
`internals::registered_types_py` to store a vector of pybind-registered
base types.  For direct-use pybind types (e.g. the `PyA` for a C++ `A`)
this is simply storing the same thing as before, but now in a vector;
for Python-side inherited types, the map lets us avoid having to do a
base class traversal as long as we've seen the class before.  The
change to vector is needed for multiple inheritance: Python types
inheriting from multiple registered bases have one entry per base.