Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for implicit conversions to C++ types #264

Closed
wants to merge 5 commits into from

Conversation

jagerman
Copy link
Member

@jagerman jagerman commented Jul 3, 2016

This PR adds a new py::implicitly_cpp_convertible<From,To>() tells pybind() that From can be implicitly converted to To (in C++), and thus pybind11-registered From types can be used in methods that accept (C++) To instances.

(See issue #259)

In essence, this lets you make use of C++ implicit converters (typically via a 'From::operator To()' or a non-explicit 'To::To(const From&)' constructor) with pybind11 instances without needing to create an explicit pybind11 interface for the conversion.

As a simple example, consider the two classes:

    class A {
        ...
        void method(uint64_t val) { ... }
    }
    class B {
        ...
        operator uint64_t() { ... } // returns unique identifier
    }

In C++, you can call a.method(22) or a.method(b). Telling pybind11 about the conversion allows you to do the same in python.

Without the implicit conversion, you would have to either overload A.method to accept both uint64_t and B instances, or would have to add a method to B's interface that returns the id, then pass this method result to A's method.

@wjakob
Copy link
Member

wjakob commented Jul 3, 2016

Sorry, I must be missing something. How is this different from the existing py::implicitly_convertible?

@jagerman
Copy link
Member Author

jagerman commented Jul 3, 2016

py::implicitly_convertible only allows conversion to types declared by the user. It also allows conversion between types aren't, at the C++ level, implicitly convertible.

This only allows conversion from types declared by the user, but allows conversion to any type that the C++ source type can be implicitly converted to.

@wjakob
Copy link
Member

wjakob commented Jul 3, 2016

A few thoughts: It's probably not a good idea to put implicitly_convertible checks into every type_caster (that's quite a lot of places! I am concerned about code generation bloat + maintainability issues).

As far as other places go, the tuple caster comes to mind (any function call will pass through that to convert the tuple of function arguments).

Also, it seems somehow inelegant to me to have two kinds of implicitly_convertible statements. It would be cleaner to have one unified declaration that subsumes both cases.

@jagerman
Copy link
Member Author

jagerman commented Jul 3, 2016

It's probably not a good idea to put implicitly_convertible checks into every type_caster (that's quite a lot of places! I am concerned about code generation bloat + maintainability issues).

Yes, moving into the tuple caster seems like a better approach. I'll do that.

Also, it seems somehow inelegant to me to have two kinds of implicitly_convertible statements. It would be cleaner to have one unified declaration that subsumes both cases.

I started out with updating implicitly_convertible, but I ended up doing two cases because the meanings are a bit different. py::implicitly_convertible<A,B> is essentially saying that when you write (in python) func(a) we'll translate that to func(B(a)) if func accepts B instances but not A instances. B(A) must be a registered constructor and will be used for the conversion.

The cpp one is a little different because it allows implicit conversion behind-the-scenes in almost any way that C++ allows it, which can be via a constructor in B, but can also be done through an operator B() in class A. What that means in practice is that py::implicitly_cpp_convertible<A,B> allows me to write func(a) in python even if I haven't exposed a B(A) constructor in Python, as long as the constructor (or some other implicit converter) exists in C++. Without that, you have to expose any non-constructor implicit conversion operators from A -> B as B constructors.

Now maybe this isn't important enough to worry about. We could say that implicitly_convertible<A,B> means something like this:

  • if B is not a pybind11-registered type: A must be a registered type, and must be implicitly convertible (in C++) to B. Any attempt to pass an A instance as an argument that doesn't accept A instances will try to do the implicit conversion, and only fail with an argument error if there is no such conversion.
  • otherwise the (registered) B type will be created as needed when passing some unsupported type Z to a function, but a (declared) B(Z) constructor exists.

As for exposing conversion operators, that's probably not too difficult anyway: something like .def("__init__", [](const A &a) { a.operator B(); }) ought to accomplish that.

@@ -321,6 +344,7 @@ struct type_caster<
bool load(handle src, bool) {
py_type py_value;

// First we try to convert the value directly to a T:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this left over from a previous patch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, yup. Fixed.

@wjakob
Copy link
Member

wjakob commented Jul 5, 2016

This patch increases the size of generated example module by ~17% (when excluding test case 18, which you added). I've been working very hard to shave off fractions like these in the last releases, so this concerns me somewhat.. 😰

I wonder if there could be a simple way to reduce the overhead?

I'm assuming that the problem was caused by moving the conversion code the std::tuple<> type caster. This made the patch less intrusive and easier to understand. The downside is that the tuple template is instantiated for literally every function definition, so those few lines you added translate in quite a bit of extra object code for every function call (vs type, of which there are many fewer).

From a software maintainer perspective, the previous approach was less nice because every type_caster declarations had to be modified to support implicit conversions.

Just a thought: how about augmenting PYBIND11_TYPE_CASTER so that it inserts a separate function into every type caster (e.g. load_or_convert) which falls back to an implicit conversion upon failure?

Generally the more you can do in templates that are instantiated fewer times the better, and ideally most of the code is in functions that are not templated at all.

Thoughts?

@jagerman
Copy link
Member Author

jagerman commented Jul 5, 2016

Some of that increase is coming from making type_caster_generic::load virtual, which was needed to abstract the try_implicit_conversion (since it needs to call load, but needs to go to the derived instance). This is only called in two places, however, so simply inlining that bit of code and de-virtualizing load() (in c1a9ecd) reduces the example so size for me by about 5.5%. (I see a total increase of around 25%, including the example18).

I'll investigate some more changes (including the TYPE_CASTER change) to see what else can be done here.

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

I implemented your suggested change (moving everything to the individual type_casters, mostly picked up via PYBIND11_TYPE_CASTER, with a load_or_convert in type_caster), but the change in so size was very minimal: it reduces the so size increase to 13.4% (over current master) from 14.3% (both counted with example18 disabled), and with the eliminated virtual I pushed in c1a9ecd.

I've commited the change in 408eb61 (not currently pushed to this PR) if you want to take a look, because I'm not sure it's worthwhile.

A slight savings, certainly, but the implementation is certainly less elegant, and more intrusive (it's definitely heading back in the direction of the original implementation). It's also a bit more fragile: for example, subclasses of PYBIND11_TYPE_CASTER-using classes with their own load() method need to provide their own load_or_convert method, so as to avoid the parent's load_or_convert calling the parent's load().

@wjakob
Copy link
Member

wjakob commented Jul 6, 2016

A question about your new patch: why not use the same storage object for the implicitly converted object? A boolean flag could be used to indicate whether an implicit conversion took place (i.e. for lifetime management). This would simplify all the operator type() functions quite a bit I think.

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

See the latest (d2b6719). This disables implicit casting entirely if a new "implicit.h" header hasn't been included, keeping the .so size unchanged when code doesn't need it, while still providing the feature when desired. (see comments below)

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

As for the other patch: PYBIND11_TYPE_CASTER sets up a Type value, if Type is trivially destructible, we could stash the implicit converted value there with an equivalent of new (&value) OutputType(...). If not trivially destructible, though, that's not safe--so either we'd have to have Type *value for everything, or a mix depending on whether Type is trivially destructible.

Either way, I don't think it's going to save a whole lot--I can't see eliminating one pointer per type_caster instantiation cutting down the .so size a lot. I think the approach I just pushed is nicer: it's cleaner (general type_casters don't need any support code at all, everything is done by the tuple type_caster), and it only increases the object size for code that actually wants to use the feature.

@dean0x7d
Copy link
Member

dean0x7d commented Jul 6, 2016

I haven't been following the code here closely, but the last change regarding implicit.h looks like an ODR violation if the header is included in some translation units but not others. E.g. there would be two type_caster<Ts...> classes with identical names and identical type parameters but different definitions depending on the inclusion of implicit.h in a translation unit. To avoid undefined behavior, the user would need to ensure to either include the header everywhere or nowhere, but that's an easy mistake to make.

If it should be optional, perhaps a preprocessor define would be a better choice, since it's much easier to enforce for all translation units.

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

Ah yes, you're right, that's not going to work.

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

Any suggestions for how to do it via a macro that wouldn't have exactly the same issue?

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

Oh, I suppose you mean a project-level build macro, rather than a per-translation unit macro.

@dean0x7d
Copy link
Member

dean0x7d commented Jul 6, 2016

Yeah, exactly that.

However, you should probably wait for @wjakob to comment on this since adding a project-level optional feature may be an additional maintenance burden going forward -- it could potentially double the test suite.

For the feature itself, my guess is that the current implementation can stay mostly intact, just with everything from implicit.h moved into cast.h and placed under an #ifdef guard. A project-level define would ensure the same code is included everywhere. Of course, this may also make it possible to simplify the implementation compared to the separate header, but I haven't dug deep enough into the code to make any meaningful comments.

@wjakob
Copy link
Member

wjakob commented Jul 6, 2016

Thanks for chiming in @dean0x7d !

@jagerman: A few more thoughts (sorry to drag this on, but I don't think it's quite ready yet): I think the approach via a separate header file is actually quite nice, though the ABI issue is of course a blocker.

The templated way of potentially adding the extra member and enabling/disabling functions in the tuple caster seems somewhat overkill to me. How about something much more old school? If implicit.h is included, #define PYBIND11_IMPLICIT_CONVERSIONS 1 (set to 0 by default, e.g. in common.h).

Next, add a constant to the template parameters of the tuple caster: int ImplicitConversions = PYBIND11_IMPLICIT_CONVERSIONS. This ensures that the types are different (i.e. no ABI problem).

Finally, a higher level remark: the terminology of just calling this an implicit conversion in the code is a bit confusing, as there is already a kind of implicit conversion (and e.g. all the type_caster::load functions have a template parameter convert which enables/disables just that.)

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

On another note, your suggestion for the load-or-convert branch (reusing the value storage with a bool for lifetime management) actually makes a surprisingly large difference (e9d7b75): the example .so increase is down to around 9% from ~ 13.4% with that applied just to the type_caster_base (since it already works by pointer rather than value).

Is that an acceptable increase, or do you prefer the opt-in approach?

Re: naming, how about changing "implicit_converted" and the like to "implicit_cpp_instances", and similarly for implicit_casters -> implicit_cpp_casters.

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

The reason for the template approach for enable/disabling rather than the old-school approach is that it isn't affected by header include order. With the #define, this won't work:

#include <pybind11/pybind11.h>
#include <pybind11/implicit.h>

which seems undesirable.

@jagerman
Copy link
Member Author

jagerman commented Jul 6, 2016

Err, ignore that, that doesn't actually fix the ODR problem.

// conversion value and moves/copies the converted value into it, returning a pointer.
// Returns nullptr if input is nullptr. The caller is responsible for destruction.
if (!input) return nullptr;
OutputType t = *reinterpret_cast<InputType*>(input);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra move -- isn't this enough? return new OutputType(*reinterpret_cast<InputType*>(input));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with it that way earlier, but it was erroneous in that it will invoke explicit conversions as well, and that could give a different conversion path than you would get in regular C++ code.

For example, in the example classes in example18.cpp, Ex18_B is implicitly convertible to Ex18_E uniquely, via its operator Ex18_E(). Without the construct-and-move approach, you (can) also get the temporary Ex18_E via either Ex18_E(b.operator double()), or Ex18_A(static_cast<A&>(b)), but those shouldn't be considered for implicit conversion (and aren't, in C++) because Ex18_E(const double&) and Ex18_E(const Ex18_A&) are both marked explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7835704 adds a comment about this, because it is definitely not immediately obvious.

@wjakob
Copy link
Member

wjakob commented Jul 7, 2016

@jagerman: I think 9% is still a lot to pay for a feature which will not be relevant to most users of this libraries. I know that 9% doesn't sound like a lot -- but look at huge binding projects to realize what this means just in terms of tens of megabytes of object code: 86d825f#commitcomment-17741296 So if this is the best that can be done, then that's fine -- however, the feature will have to be optional in this case.

One aspect that I don't yet understand is with regards to the lifetime of created objects.
There is naturally no issue with ints, floats and other trivial types. However, what happens when some instance of class "A" is implicitly converted to an instance of class "B"? As far as I understand, your code will delete the "B" instance after the function call.

However, the Python callee could certainly expect to store one of the function parameters in a variable (e.g. class attribute) and expect it to be alive later on. This was never a problem with the previous set of implicit conversions because they are always tracked by an underlying ref-counted PyObject subclass.

Or am I just missing something?

// Some validity checks on output type:
if (!std::is_destructible<OutputType>::value)
pybind11_fail("implicitly_convertible: " + type_id<OutputType>() + " is not destructible");
if (!std::is_move_constructible<OutputType>::value && !std::is_copy_constructible<OutputType>::value)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for OutputType to be movable but not copyable, but still support implicit conversion (see example below). We prefer a move constructor, but if that isn't available, fallback to a copy constructor--so this is just checking that at least one of those is available. This if statements (along with the one before it) should also be compiled away entirely by any decent compiler, since the ::value's are constexpr bools.

Example: implicit conversion of a movable, non-copyable type:

class In {};
class Out {
public:
    Out() = default;
    Out(const Out &) = delete;
    Out(Out &&) = default;
    Out(const In&) {}
};

void out(const Out&) { std::cout << "out called\n"; }

int main() {
    In in;
    out(in);
}

@wjakob
Copy link
Member

wjakob commented Jul 8, 2016

Any thoughts about the lifetime issue?

@wjakob wjakob force-pushed the master branch 3 times, most recently from 93871f7 to 00cec38 Compare July 8, 2016 11:47
@wjakob
Copy link
Member

wjakob commented Jul 13, 2016

Ok, good to know! (those conversions could be automatically added when registering types with multiple inheritance)

One more question: is there ever a situation when your patch causes a chaining behavior? I thought I remembered something like that when looking at your testcases, though perhaps I misunderstood.

@wjakob
Copy link
Member

wjakob commented Jul 14, 2016

Looks like MSVC has a problem with the latest version (lots of errors of the type "parenthesized type followed by an initializer list is a non-standard explicit type conversion syntax")

@jagerman
Copy link
Member Author

Whoops, that's not right.

@jagerman
Copy link
Member Author

One more question: is there ever a situation when your patch causes a chaining behavior? I thought I remembered something like that when looking at your testcases, though perhaps I misunderstood.

No, chaining won't happen--the code only looks at implicit conversions that have been registered with the argument type as the output type. So py::i_c<A,B>(); py::i_c<B,C>() won't consider the A->B conversion when looking for a way to convert the arguments to an argument of type C. If would have to be a bit more complicated to allow that to also detect and avoid chaining loops, but it would also be implementing a C++-level implicit conversion beyond what C++ actually implements. There's also a second issue with chaining in that we don't allow implicit conversion between two unregistered types (e.g. A->B, B->C, with both B and C unregistered).

So I suppose that, in order to use this mechanism for multiple inheritance, it would involve checking in implicitly_convertible<A,B> whether std::is_base_of<B, A> and, if so, storing a function that simply does a static_cast<A*>(b) (rather than constructing a new A from the B). It wouldn't need the constructor or destructor, but we could still store the base class pointer in implicit_instances, with a flag to say "don't delete this" during function_call_internals destruction.

@jagerman
Copy link
Member Author

jagerman commented Jul 15, 2016

See 44f20dc (not currently pushed to this PR branch) for an implementation of the base class casting and chaining. It changes py::implicitly_convertible<A,B>() to do base class casting, rather than creating a new instances, when B is a subclass of A. Multiple inheritance isn't a problem (it'll evaluate all possible chains that it's been told about the first time the conversion from F to T is needed, and then cache the resulting chain (or lack thereof)).

@wjakob
Copy link
Member

wjakob commented Jul 15, 2016

Hello Jason,

this patch looks very promising!
It looks like this could be merged with another patch (BorisSchaeling@8158622) which introduces a nice API for declaring multiple inheritance but does not do the casting/chaining correctly.

I will need to set aside a bit more time to go through it in detail -- probably I will not get to it until next week.

Best,
Wenzel

@jagerman
Copy link
Member Author

Squashed into one.

@wjakob
Copy link
Member

wjakob commented Jul 21, 2016

FYI: I'm traveling this next week, so this PR will remain open for a bit. (I need to sit down for a day or so to merge this and the multiple inheritance change and make sure that I don't totally mess things up ;))

@wjakob
Copy link
Member

wjakob commented Jul 24, 2016

I just tried to give this a test drive but ran into a bunch of compilation issues (possibly related to C++14 mode?)
compiler-errors.txt

@jagerman
Copy link
Member Author

Does 0c3925e fix it?

@wjakob
Copy link
Member

wjakob commented Aug 4, 2016

Hmm, looks like this got broken now by some of the other PRs -- sorry :(

@jagerman
Copy link
Member Author

jagerman commented Aug 5, 2016

Rebased and squashed.

See issue pybind#259.

This commit adds support for py::implicitly_convertible<From,To>() to
perform implicit conversion, at the C++ level, when To is not a
pybind11-registered type (e.g. a custom C++ type that should not be
exposed, or a primitive type (double, etc.)).

In essence, this lets you make use of C++ implicit converters (typically
via a 'From::operator To()' or a non-explicit 'To::To(const From&)'
constructor) with pybind11 instances without needing to create an
explicit pybind11 interface for the conversion.

As a simple example, consider the two classes:

    class A {
        ...
        void method(uint64_t val) { ... }
    }
    class B {
        ...
        operator uint64_t() { ... } // returns unique identifier
    }

In C++, you can call `a.method(22)` or `a.method(b)`.  Telling
pybind11 about the conversion allows you to do the same in python.

Without the implicit conversion, you would have to either overload
A.method to accept both uint64_t and B instances, or would have to add
a method to B's interface that returns the id, then pass this method
result to A's method.
Depending on exactly when MSVC decides to inline, this can get triggered
in ->impl when the called code unconditionally throws an exception.  MSVC
is, in that case, right that the code is unreachable, but it's also a
stupid warning (because it *doesn't* happen if MSVC doesn't decide to
inline).
@wjakob
Copy link
Member

wjakob commented Sep 10, 2016

Hi Jason,

I took a full pass over this PR just now (which has been in the works for quite a while at this point).

Having seen it again in its entirety, I'm sorry to say that it is still too intrusive of a change for me. When compiling with clang, this commit increases the generated machine code size by 4.2%, which is not even the main problem: it's the serious amount of code infrastructure & heap-allocated data structures that need to be dropped into the hottest path of pybind11, i.e. the dispatcher. This
seems quite counter to recent efforts of shaving unnecessary nanoseconds from the function dispatcher. The other side of this is that it makes the library harder to understand (even after looking at this patch in many iterations, I am still not confident that I understand all the corner cases that led to the current design. It's really the sheer size of the change that bothers me, so there is no single part that I could point to and say "fix this, and it's mergeable" (which I realize is not so helpful). I was more optimistic at earlier points (sorry!), but this was while only looking at individual portions of the PR.

At this point, I'm not so sure how to proceed. The options I see are to simply close the issue or fundamentally rethink the approach. Perhaps there could be a "lite" version of this patch that addresses an important use case in a minimally intrusive way.

Best,
Wenzel

@jagerman
Copy link
Member Author

jagerman commented Sep 10, 2016

Hi Wenzel,

That's a fair assessment, I'll close the PR.

I'll leave the branch around, in case anyone ever wants to revisit this (or something similar) in the future, but I'm okay with just forgetting about it and adding a couple work-arounds for the cases where I use it.

Incidentally, one of the comments that has come up in this and several other issues/PRs relates to multiple inheritance. Just a thought: it would be nice to have an open "collector" issue to capture the bits and pieces that relate and could potentially contribute to this, such as the experimental upcast walker I played with as a tangent of this PR (44f20dc), and (BorisSchaeling/pybind11@8158622). It could serve as a long-term discussion area for multiple inheritance support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants